Exploring Large Document Collections with a Dynamic Hierarchy

نویسندگان

  • Alexander Gelbukh
  • Grigori Sidorov
چکیده

We present a method for dealing with the large document collections that takes into account the user needs in a better way. The user can choose the desired granularity of the hierarchy and then apply this hierarchy to document collection for classifying the documents. The granularity depends directly on the number of clusters. The clusters are presented to the user in two different ways: (1) as a representative document of the cluster; (2) as a set of keywords characterizing all documents of the

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exploring Large Digital Library Collections Using a Map-Based Visualisation

In this paper we describe a novel approach for exploring large document collections using a map-based visualisation. We use hierarchically structured semantic concepts that are attached to the documents to create a visualisation of the semantic space that resembles a Google Map. The approach is novel in that we exploit the hierarchical structure to enable the approach to scale to large document...

متن کامل

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

متن کامل

Exploration of Text Collections with Hierarchical Feature

Document classiication is one of the central issues in information retrieval research. The aim is to uncover similarities between text documents. In other words, classiication techniques are used to gain insight in the structure of the various data items contained in the text archive. In this paper we show the results from using a hierarchy of self-organizing maps to perform the text classiicat...

متن کامل

Learning Topic Hierarchies and Thematic Annotations from Document Collections

Large textual and multimedia databases are now widely available but their exploitation is restricted by the lack of metainformation about their structure and semantics. Many such collections like those gathered by most search engines are loosely structured. Some have been manually structured, at the expense of an important effort. This is the case of hierarchies like those of internet portals (...

متن کامل

Summarization of Changes in Dynamic Text Collections

Information Retrieval is the Informatics field primarily focused on all problems and challenges related to information storage and access. The large majority of works in this area are based on static collections of documents. However, many of these collections are dynamic, and have evolved over time with documents being added, edited or simply removed at different times. Even in highly dynamic ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005